Loading Models - OpenCLIP

OpenCLIP provides flexible model loading with support for pretrained weights, custom configurations, and multiple storage backends.

Basic Model Loading

create_model()

The core function for creating CLIP models with flexible configuration options.

import open_clip

model = open_clip.create_model(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    precision='fp16'
)

model_name (str, required)

Model architecture name (e.g., 'ViT-B-32', 'RN50') or a schema-prefixed path:

Built-in: 'ViT-B-32'
HuggingFace Hub: 'hf-hub:org/repo'
Local directory: 'local-dir:/path/to/model'
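
To make the three forms of model_name concrete, here is a minimal sketch of how a schema prefix could be separated from the rest of the string. The helper split_model_schema is hypothetical and written only for illustration; it is not OpenCLIP's own parsing code, and only the two prefixes listed above are assumed.

```python
from typing import Optional, Tuple

# Prefixes taken from the list above; the helper itself is a hypothetical
# illustration, not part of the open_clip API.
SCHEMA_PREFIXES = ("hf-hub:", "local-dir:")


def split_model_schema(model_name: str) -> Tuple[Optional[str], str]:
    """Return (schema, identifier); schema is None for built-in names."""
    for prefix in SCHEMA_PREFIXES:
        if model_name.startswith(prefix):
            # Drop the trailing colon from the schema, keep the remainder as-is.
            return prefix[:-1], model_name[len(prefix):]
    return None, model_name


print(split_model_schema("ViT-B-32"))
print(split_model_schema("hf-hub:org/repo"))
print(split_model_schema("local-dir:/path/to/model"))
```

A built-in name comes back with no schema, while the other two forms yield the schema and the repo ID or directory path that follows it.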

pretrained (str)

Source of the pretrained weights. Can be:

Tag name (e.g., 'openai', 'laion2b_s34b_b79k')
Local file path (e.g., '/path/to/weights.pt')

This argument is ignored if model_name uses a schema prefix.

device (str | torch.device, default: 'cpu')

Device to load the model on ('cpu', 'cuda', etc.)

precision (str, default: 'fp32')

Model precision: 'fp32', 'fp16', 'bf16', 'pure_fp16', or 'pure_bf16'

jit (bool, default: False)

Whether to JIT-compile the model

force_image_size (int | Tuple[int, int])

Override the model's default image size

cache_dir (str)

Directory for caching downloaded weights

Loading Schemas

HuggingFace Hub

Load models directly from HuggingFace Hub using the hf-hub: schema:

model = open_clip.create_model(
    'hf-hub:laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K',
    device='cuda'
)

The function automatically:

Downloads open_clip_config.json from the repo
Looks for weights files (.safetensors, .bin, .pth)
Merges preprocessing configuration

Local Directory

Load from a local directory containing model config and weights:

model = open_clip.create_model(
    'local-dir:/path/to/my/model',
    device='cuda'
)

Local directory must contain:

open_clip_config.json with model configuration
Weight file (searched in order): open_clip_model.safetensors, pytorch_model.bin, model.pth, etc.
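
The directory requirements above can be checked before calling create_model(). The sketch below is an assumption-laden illustration: find_local_weights is a hypothetical helper, and its candidate list contains only the three filenames named above (the real search list is longer, as the "etc." indicates).

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "open_clip_config.json"
# Only the filenames mentioned in the text; the actual list may contain more.
WEIGHT_CANDIDATES = (
    "open_clip_model.safetensors",
    "pytorch_model.bin",
    "model.pth",
)


def find_local_weights(model_dir: str) -> Optional[Path]:
    """Return the first candidate weight file found, or None if there is none.

    Raises FileNotFoundError when the config file is missing.
    """
    root = Path(model_dir)
    if not (root / CONFIG_NAME).is_file():
        raise FileNotFoundError(f"{CONFIG_NAME} not found in {root}")
    # Candidates are checked in order, so earlier names take priority.
    for name in WEIGHT_CANDIDATES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None
```

Running a check like this first turns a missing config or weight file into an early, readable error rather than a failure partway through model creation.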

Local File Path

Load weights from a specific file:

model = open_clip.create_model(
    'ViT-B-32',
    pretrained='/path/to/checkpoint.pt',
    device='cuda'
)

Advanced Loading Options

Tower-Specific Weights

Load separate weights for image and text towers:

model = open_clip.create_model(
    'ViT-B-32',
    pretrained_image=True,  # Load default ImageNet weights
    pretrained_text=True,   # Load default LM weights
    pretrained_image_path='/path/to/vision.pt',  # Override with custom weights
    pretrained_text_path='/path/to/text.pt'
)

pretrained_image (bool, default: False)

Load default pretrained weights for the image tower (timm models)

pretrained_text (bool, default: True)

Load default pretrained weights for the text tower (HuggingFace models)

pretrained_image_path (str)

Path to custom image tower weights (loaded after the full-model weights)

pretrained_text_path (str)

Path to custom text tower weights (loaded after the full-model weights)

Custom Model Configuration

Override model architecture parameters:

model = open_clip.create_model(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    force_quick_gelu=True,
    force_patch_dropout=0.5,
    force_image_size=336,
    force_context_length=128
)

create_model_and_transforms()

Convenience function that returns the model together with its preprocessing transforms:

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    precision='fp16'
)

# Use transforms
from PIL import Image
image = Image.open('example.jpg')
image_tensor = preprocess_val(image)

Returns a tuple of (model, train_transform, val_transform). The transforms handle:

Image resizing and cropping
Normalization with correct mean/std
Data augmentation (training only)

Always call model.eval() before inference. Models are in training mode by default, which changes the behavior of layers such as BatchNorm.

create_model_from_pretrained()

Strictly requires pretrained weights and raises an error if they can't be loaded:

model, preprocess = open_clip.create_model_from_pretrained(
    'ViT-B-32',
    pretrained='laion2b_s34b_b79k',
    device='cuda',
    return_transform=True
)

return_transform (bool, default: True)

Whether to return the preprocessing transform. If False, only the model is returned.

This is the recommended function for inference use cases where pretrained weights are essential.

Listing Available Models

import open_clip

# List all model architectures
architectures = open_clip.list_models()
print(architectures)  # ['RN50', 'RN101', 'ViT-B-32', 'ViT-L-14', ...]

# List all pretrained weights
pretrained = open_clip.list_pretrained()
for model_name, tag in pretrained:
    print(f"{model_name}:{tag}")

# List pretrained weights as strings
pretrained_str = open_clip.list_pretrained(as_str=True)
# ['RN50:openai', 'RN50:yfcc15m', 'ViT-B-32:laion2b_s34b_b79k', ...]
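
The (model_name, tag) pairs are easier to browse when grouped by architecture. The following is a small sketch using only the standard library; group_tags_by_model is a hypothetical helper, and sample_pairs is hand-written from the example entries shown above rather than fetched from open_clip.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_tags_by_model(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (model_name, tag) pairs into {model_name: [tag, ...]}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for model_name, tag in pairs:
        grouped[model_name].append(tag)
    return dict(grouped)


# Hand-written sample mirroring the entries shown above.
sample_pairs = [
    ("RN50", "openai"),
    ("RN50", "yfcc15m"),
    ("ViT-B-32", "laion2b_s34b_b79k"),
]

tags = group_tags_by_model(sample_pairs)
print(tags["RN50"])  # ['openai', 'yfcc15m']
```

In real use, the result of open_clip.list_pretrained() would be passed in place of sample_pairs.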

Weight Loading Options

load_weights (bool, default: True)

Whether to load the resolved pretrained weights. Set to False for random initialization.

require_pretrained (bool, default: False)

Raise an error if pretrained weights cannot be loaded

weights_only (bool, default: True)

Pass weights_only=True to torch.load (safer; prevents arbitrary code execution)

Complete Example

import torch
import open_clip
from PIL import Image

# Load model with transforms
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14',
    pretrained='datacomp_xl_s13b_b90k',
    device='cuda',
    precision='fp16',
    force_image_size=224
)
model.eval()

# Get tokenizer
tokenizer = open_clip.get_tokenizer('ViT-L-14')

# Prepare inputs
image = preprocess(Image.open('cat.jpg')).unsqueeze(0).cuda()
text = tokenizer(["a cat", "a dog"]).cuda()

# Inference
with torch.no_grad(), torch.autocast('cuda'):  # torch.cuda.amp.autocast() is deprecated
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    # Normalize features
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    
    # Compute similarity
    similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    print("Similarity:", similarity)
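
The last three steps of the example (L2 normalization, scaled dot product, softmax) can be worked through on tiny vectors without a GPU. This is a plain-Python sketch of the same arithmetic, assuming the scale factor of 100.0 used above; the function names are illustrative and the two-dimensional vectors are made up.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_probs(
    image_vec: Sequence[float],
    text_vecs: Sequence[Sequence[float]],
    scale: float = 100.0,
) -> List[float]:
    """Softmax over scaled cosine similarities between one image and N texts."""
    img = l2_normalize(image_vec)
    logits = []
    for text_vec in text_vecs:
        txt = l2_normalize(text_vec)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    return softmax(logits)


# Made-up vectors: the first text points the same way as the image.
probs = similarity_probs([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0]])
print(probs)
```

Because the features are normalized first, each logit is a cosine similarity times the scale, so even modest differences in similarity produce sharply peaked probabilities.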
